Abstract

In structural pattern recognition, given a set of graphs, the computation of a Generalized Median Graph is a well-known problem. Some methods approach the problem by assuming a relation between the Generalized Median Graph and the Common Labelling problem. However, this relation has still not been formally proved. In this paper, we analyse the relation between the two problems. The main result proves that the cost of the Common Labelling upper-bounds the cost of the median with respect to the given set. In addition, we show that the two problems are equivalent in some cases.

1 Introduction 

In many pattern recognition applications, we are given a set of different representations of the same object, and the goal is to summarize these representations into a single one. The resulting representation should capture the important features of the object and discard noisy or unexpected variations. When the representation is made using attributed graphs, this graph is identified as the Generalized Median Graph [1], or simply the Median Graph. Given a training set of graphs, the Median Graph is formally defined as a graph which minimizes the sum of costs to all the graphs in the set.

If we assume that vertices are not uniquely labelled, as in [2], the problem of finding the Median Graph is, in its general form, at least as difficult as the problem of matching two graphs under a particular cost function, e.g. the Graph Edit Distance, whose computation is NP-hard [3]. Indeed, the Median Graph cannot be computed in closed form, since its synthesis depends on the matchings between itself and the given graphs, and these matchings clearly require having the Median Graph. A usual way to deal with this chicken-and-egg problem is an incremental approach, where the Median Graph is coarsely constructed and then iteratively refined until all graphs in the training set are considered. Several approaches address the problem in this fashion [4] [5] [6] [7] [8] [9]. A completely different approach to compute the Median Graph is to decouple the matching and the synthesis processes. This approach relies on the assumption that, given the vertex labellings, the Median Graph can, in most applications, be computed efficiently in polynomial time, e.g. by averaging the vertex and edge attributes. The approach can be summarized in two steps. In the first step, we obtain a Common Labelling between the given graphs. The objective of the Common Labelling, initially defined in [10] [11], is to minimize the cost of the pairwise labellings among a set of graphs subject to some transitivity restrictions. In the second step, once we know these labellings, we can easily compute an Approximated Median Graph. Figure 1 illustrates the complete process to generate a Median Graph using a Common Labelling. Note that the given graphs are labelled to a virtual node set and that the Median Graph is not computed until the end of the process. This is the main advantage of approximating the Median Graph through a Common Labelling: labellings of the initial graphs to the Median Graph are not needed, and the initial chicken-and-egg problem disappears.

Several works in the literature decouple the problem of the Median Graph computation. The first method to completely decouple the matching process from the synthesis process was presented by Hlaoui and Wang [7]. Another recent method, based on linear programming, has been proposed in [12] and [13]. Possibly the most complete work on this kind of method is presented in [14]. Experiments in [14] show that using the Common Labelling for computing the Median Graph gives satisfactory results, but up to now a formal relation between the Common Labelling problem and the Median Graph synthesis was missing. In this work, we show that, if the cost for matching graphs is a metric, the two problems are tightly connected, because we can bound the Median Graph error using the Common Labelling value. The obtained bounds show that, when the error of the Common Labelling is low, the obtained graph median is close to the real one. In addition, in the specific case of unattributed graphs with the squared Euclidean distance as cost function, the two problems are equivalent.




Figure 1: The process for computing an Approximated Median Graph with the Common Labelling. After representing the given objects (in this figure, sketches of electrical circuits) with attributed graphs, we look for a labelling of the nodes of each graph. The virtual set of nodes has no structure and is used to compare the labellings of the graphs and to evaluate their pairwise matching costs. After choosing a labelling for each graph, we convert the virtual set of nodes into the actual graph prototype. See Section 3 for a formal definition of the Common Labelling problem.
2 Definitions



Let $\mathcal{H}$ be a set of attributed graphs representing the input/output space of our problems. Each graph is represented as a tuple $G = (V, E, A_V, A_E)$, where $V = \{v_1, \dots, v_n\}$ represents the vertex set, $E \subseteq \{e_{a,b} \mid a, b \in 1..n\}$ the edge set, and the functions $A_V : V \to D_V$ and $A_E : E \to D_E$ assign attributes to vertices and edges, respectively.
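As an illustration only, when attributes are scalars the tuple $G = (V, E, A_V, A_E)$ can be stored densely as a vector of vertex attributes and an $n \times n$ matrix of edge attributes. The sketch below assumes real-valued attributes and marks an absent edge with `0.0`, the convention of Section 5. `AttributedGraph` and `from_sets` are hypothetical names and do not belong to any library.

```python
from dataclasses import dataclass


@dataclass
class AttributedGraph:
    """Dense storage of G = (V, E, A_V, A_E) with scalar attributes (assumption)."""
    V: list[float]        # V[r] = A_V(v_r)
    E: list[list[float]]  # E[r][s] = A_E(e_{r,s}); 0.0 marks a missing edge

    def __post_init__(self) -> None:
        n = len(self.V)
        if len(self.E) != n or any(len(row) != n for row in self.E):
            raise ValueError("E must be an n x n matrix matching the n vertices")


def from_sets(vertex_attrs: list[float],
              edge_attrs: dict[tuple[int, int], float]) -> AttributedGraph:
    """Build the dense form from A_V (one value per vertex) and A_E (dict keyed by (a, b))."""
    n = len(vertex_attrs)
    E = [[0.0] * n for _ in range(n)]
    for (a, b), value in edge_attrs.items():
        if not (0 <= a < n and 0 <= b < n):
            raise ValueError(f"edge ({a}, {b}) refers to a missing vertex")
        E[a][b] = value
    return AttributedGraph(list(vertex_attrs), E)


g = from_sets([1.0, 2.0, 0.5], {(0, 1): 0.3, (1, 2): 0.7})
```

In this example, `g.E[2][1]` is `0.0` because only the ordered pair (1, 2) carries an attribute. An undirected graph would store both entries.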

Given a set of $m$ attributed graphs $S = \{G_1, \dots, G_m\}$, $G_i = (V_i, E_i, A_V, A_E) \in \mathcal{H}$, we assume that each of these graphs has the same number of vertices $n$. If this is not the case, several solutions have been proposed to extend the size of the graphs [9] [15]. However, the most common approach is to include null vertices [15], which represent deletions and insertions of vertices in the resulting labelling. In the general graph matching setting, the vertices of each graph are not uniquely identified by their index, i.e. we cannot assign or identify $v_3 \in V_1$ with $v_3 \in V_2$ only because they have the same index 3. Indeed, the difficult part of comparing a pair of graphs lies in finding a suitable bijection $\pi$ of vertices which provides the right ordering. In the following, given a bijection $\pi$, the notation $G^{\pi}$ means that $v_i(\pi) = v_{\pi(i)}$, so that the vertices $V$ and edges $E$ are permuted according to $\pi$. The bijection $id \in \Pi$, where $\Pi$ is the set of all bijections of the $n$ vertex indices, represents the identity, $v_i(id) = v_i$. Figure 2 (a) shows how graphs are permuted to a common reference system with permutations $\pi_i$ and $\rho_i$.
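The convention $v_i(\pi) = v_{\pi(i)}$ can be made concrete on a dense representation, that is, a list of vertex attributes plus an $n \times n$ edge-attribute matrix (an assumption made for illustration). Relabelling reorders the vertex vector and both indices of the edge matrix. The following minimal sketch assumes 0-based indices, scalar attributes and a hypothetical helper `permute`.

```python
def permute(V, E, pi):
    """Return G^pi: vertex i of the result is vertex pi[i] of the input.

    V is the list of vertex attributes, E the n x n edge-attribute matrix,
    and pi a sequence holding a permutation of 0..n-1.
    """
    n = len(V)
    if sorted(pi) != list(range(n)):
        raise ValueError("pi must be a permutation of 0..n-1")
    Vp = [V[pi[i]] for i in range(n)]
    # Both endpoints of every edge are relabelled, so E'[i][j] = E[pi[i]][pi[j]].
    Ep = [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)]
    return Vp, Ep


V = [10.0, 20.0, 30.0]
E = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 2.0],
     [0.0, 0.0, 0.0]]
Vp, Ep = permute(V, E, (2, 0, 1))
```

With `pi = (2, 0, 1)`, `Vp` is `[30.0, 10.0, 20.0]`. The input edge of weight 1.0 from vertex 0 to vertex 1 now appears at `Ep[1][2]`.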
The function $c: \mathcal{H} \times \Pi \times \mathcal{H} \times \Pi \to \mathbb{R}$ is a user-defined cost between two graphs whose vertices have a fixed bijection. We assume that $c(\cdot, \cdot, \cdot, \cdot)$ can be computed efficiently in polynomial time, because the vertex-to-vertex correspondence is fixed, and consequently so are the edge-to-edge correspondence and the corresponding attributes. We use the shorthand $c(G_i^{\pi_i}, G_j^{\pi_j}) = c(G_i, \pi_i, G_j, \pi_j)$ and $c(G_i, G_j) = c(G_i, id, G_j, id)$. If the cost function is a metric, we denote it as $c_M$. In this case, given a fixed set of bijections $\pi_1, \dots, \pi_m$, the following axioms hold:

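identity: $c_M(G_i^{\pi_i}, G_j^{\pi_j}) = 0 \iff G_i^{\pi_i} = G_j^{\pi_j}$,
positivity: $c_M(G_i^{\pi_i}, G_j^{\pi_j}) \ge 0$,
symmetry: $c_M(G_i^{\pi_i}, G_j^{\pi_j}) = c_M(G_j^{\pi_j}, G_i^{\pi_i})$,

triangle inequality: $c_M(G_i^{\pi_i}, G_k^{\pi_k}) \le c_M(G_i^{\pi_i}, G_j^{\pi_j}) + c_M(G_j^{\pi_j}, G_k^{\pi_k})$.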
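For intuition, one cost that satisfies these axioms for a fixed bijection is the sum of absolute differences (an $L_1$ distance) between corresponding vertex and edge attributes. The sketch below assumes scalar attributes stored as a vertex list plus an $n \times n$ edge matrix. It only tests the axioms on random graphs, so passing it is evidence, not a proof. `cost_l1`, `random_graph` and `check_axioms` are hypothetical names.

```python
import random


def cost_l1(g, h):
    """Assumed metric cost for a fixed bijection: L1 distance between all
    corresponding vertex attributes and edge attributes of g and h."""
    (Vg, Eg), (Vh, Eh) = g, h
    n = len(Vg)
    return (sum(abs(Vg[r] - Vh[r]) for r in range(n))
            + sum(abs(Eg[r][s] - Eh[r][s]) for r in range(n) for s in range(n)))


def random_graph(n, rng):
    """Random dense graph: n vertex attributes and an n x n edge-attribute matrix."""
    return ([rng.random() for _ in range(n)],
            [[rng.random() for _ in range(n)] for _ in range(n)])


def check_axioms(trials=200, n=3, seed=0):
    """Empirically test the four axioms on random triples (a sanity check, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (random_graph(n, rng) for _ in range(3))
        assert cost_l1(a, a) == 0.0                    # identity, one direction
        assert (cost_l1(a, b) == 0.0) == (a == b)      # identity, other direction
        assert cost_l1(a, b) >= 0.0                    # positivity
        assert cost_l1(a, b) == cost_l1(b, a)          # symmetry
        assert cost_l1(a, c) <= cost_l1(a, b) + cost_l1(b, c) + 1e-12  # triangle
    return True
```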

We define the distance $d$ between two graphs as the minimum cost among all possible bijections of their vertices, and hence of their vertex and edge attributes. That is,

$$d(G_1, G_2) := \min_{\pi_1, \pi_2 \in \Pi} c_M(G_1^{\pi_1}, G_2^{\pi_2}). \tag{1}$$
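For very small graphs, (1) can be evaluated by exhaustive search. The sketch assumes that the cost does not change when the same bijection is applied to both graphs. This holds for the $L_1$ cost used here as an illustrative $c_M$, and under that assumption it suffices to relabel only the second graph. The $n!$ enumeration limits this to a handful of vertices, and all names are hypothetical.

```python
from itertools import permutations


def permute(V, E, pi):
    """G^pi with V'[i] = V[pi[i]] and E'[i][j] = E[pi[i]][pi[j]]."""
    n = len(V)
    return ([V[pi[i]] for i in range(n)],
            [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)])


def cost_l1(g, h):
    """Assumed metric cost for a fixed bijection: L1 distance of all attributes."""
    (Vg, Eg), (Vh, Eh) = g, h
    n = len(Vg)
    return (sum(abs(Vg[r] - Vh[r]) for r in range(n))
            + sum(abs(Eg[r][s] - Eh[r][s]) for r in range(n) for s in range(n)))


def distance(g, h, cost=cost_l1):
    """d(g, h): minimum cost over all n! relabellings of h (exhaustive search)."""
    n = len(g[0])
    return min(cost(g, permute(*h, pi)) for pi in permutations(range(n)))


g1 = ([1.0, 2.0, 3.0], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
g2 = permute(*g1, (1, 2, 0))  # same graph, vertices listed in another order
g3 = ([1.0, 2.0, 4.0], g1[1])
```

Here `distance(g1, g2)` is 0.0 although the fixed-bijection cost `cost_l1(g1, g2)` is not, which is exactly the gap between $d$ and $c_M$.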



Given a set of graphs $S = \{G_1, \dots, G_m\} \subseteq \mathcal{H}$, the Generalized Median Graph [1] is defined as a graph $G^*$, taken from the set $\mathcal{H}$, which minimizes the average sum of costs to all graphs in $S$:

$$GM^*(\mathcal{H}) := \min_{G \in \mathcal{H}} \; \min_{\rho_1, \dots, \rho_m \in \Pi} \frac{1}{m} \sum_{i=1}^{m} c(G_i^{\rho_i}, G). \tag{2}$$

If not explicitly stated, the argument of $GM^*$ is $\mathcal{H}$. In the following, and as Figure 2 (a) shows, we will denote with $\rho_i$ the permutations which obtain the Median Graph.



3 The Common Labelling Problem 

Given a set of graphs $S = \{G_1, \dots, G_m\} \subseteq \mathcal{H}$, the Common Labelling problem aims at finding a consistent multiple isomorphism between the graphs, possibly of low cost, such that for every three mappings $\pi_{i,j}$, $\pi_{j,r}$ and $\pi_{i,r}$ we have $\pi_{j,r} \circ \pi_{i,j} = \pi_{i,r}$. Equivalently, we look for $m$ consistent bijections that assign the vertices of each graph to a virtual vertex set and that minimize the average sum of pairwise costs between the graphs in $S$. Its normalized objective function is the following:

$$CL^* := \min_{\pi_1, \dots, \pi_m \in \Pi} \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} c(G_i^{\pi_i}, G_j^{\pi_j}). \tag{3}$$
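A brute-force evaluation of (3) makes the structure of the problem explicit: there is one bijection per graph, and every pair of graphs is compared under those bijections. Labelling every graph to a common virtual vertex set makes the pairwise mappings consistent by construction. The sketch assumes the $L_1$ cost as a stand-in for $c$. It fixes $\pi_1$ to the identity, which loses nothing when the cost is invariant under relabelling all graphs with the same bijection. Its $(n!)^{m-1}$ enumeration restricts it to toy sizes. `common_labelling` is a hypothetical name and is not one of the algorithms of [10] [11].

```python
from itertools import permutations, product


def permute(V, E, pi):
    """G^pi with V'[i] = V[pi[i]] and E'[i][j] = E[pi[i]][pi[j]]."""
    n = len(V)
    return ([V[pi[i]] for i in range(n)],
            [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)])


def cost_l1(g, h):
    """Assumed cost for a fixed bijection (L1 over all attributes)."""
    (Vg, Eg), (Vh, Eh) = g, h
    n = len(Vg)
    return (sum(abs(Vg[r] - Vh[r]) for r in range(n))
            + sum(abs(Eg[r][s] - Eh[r][s]) for r in range(n) for s in range(n)))


def common_labelling(graphs, cost=cost_l1):
    """Return (CL*, (pi_1, ..., pi_m)) for (3) by exhaustive search."""
    m, n = len(graphs), len(graphs[0][0])
    identity = tuple(range(n))
    perms = list(permutations(range(n)))
    best_val, best_pis = float("inf"), None
    for rest in product(perms, repeat=m - 1):
        pis = (identity,) + rest  # pi_1 fixed to the identity
        relabelled = [permute(*g, p) for g, p in zip(graphs, pis)]
        val = sum(cost(a, b) for a in relabelled for b in relabelled) / m ** 2
        if val < best_val:
            best_val, best_pis = val, pis
    return best_val, best_pis


base = ([1.0, 2.0, 3.0], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
S = [base, permute(*base, (2, 0, 1)), permute(*base, (1, 0, 2))]
cl, pis = common_labelling(S)
```

The three graphs are relabelled copies of `base`, so the search returns $CL^* = 0$ with bijections that map every copy back onto `base`.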

Once the Common Labelling and the $m$ bijections $\pi_1, \dots, \pi_m$ that attain its value are obtained, we assume that we can efficiently estimate a median graph $\hat{G}$:

$$\hat{G} \in \operatorname*{argmin}_{G \in \mathcal{H}} \sum_{i=1}^{m} c(G_i^{\pi_i}, G), \tag{4}$$

which we call the Approximated Median Graph. In the following, and as Figure 2 (a) shows, we will denote with $\pi_i$ the permutations which obtain the Approximated Median Graph through the Common Labelling.
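Equation (4) is the synthesis step once the bijections are fixed, and its closed form depends on the cost. For the $L_1$ cost assumed in this sketch, the objective separates over vertex and edge entries, so a minimizer is the entry-wise median of the relabelled graphs. For the squared Euclidean cost of Section 5 it would be the mean instead. The bijections are taken as given, e.g. from a Common Labelling solver. `approx_median_l1` is a hypothetical name.

```python
from statistics import median


def permute(V, E, pi):
    """G^pi with V'[i] = V[pi[i]] and E'[i][j] = E[pi[i]][pi[j]]."""
    n = len(V)
    return ([V[pi[i]] for i in range(n)],
            [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)])


def approx_median_l1(graphs, pis):
    """Minimizer of (4) for an L1 cost: entry-wise median of the relabelled graphs."""
    relabelled = [permute(*g, p) for g, p in zip(graphs, pis)]
    n = len(relabelled[0][0])
    V = [median(G[0][r] for G in relabelled) for r in range(n)]
    E = [[median(G[1][r][s] for G in relabelled) for s in range(n)] for r in range(n)]
    return V, E


S = [([0.0, 5.0], [[0.0, 1.0], [0.0, 0.0]]),
     ([2.0, 0.0], [[0.0, 0.0], [1.0, 0.0]]),
     ([1.0, 2.0], [[0.0, 1.0], [0.0, 0.0]])]
G_hat = approx_median_l1(S, [(0, 1)] * 3)
G_hat_swapped = approx_median_l1(S, [(0, 1), (1, 0), (0, 1)])
```

Changing the bijection of the second graph changes the synthesized graph. This is why the quality of the Common Labelling matters.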
4 Relating the Common Labelling with the Generalized Median Graph

In this section, we show the two main results of this work. The first theorem shows the relationship between the objective function of the Common Labelling, $CL^*$, and the objective function of the Median Graph, $GM^*$. The second theorem shows that, if the functional of the Common Labelling $CL^*$ has a low value, the Approximated Median Graph $\hat{G}$ is close to the Median Graph $G^*$.

Theorem 1. Let $\mathcal{H}$ be a set of graphs and $S = \{G_1, \dots, G_m\}$ a subset of $\mathcal{H}$. In addition, let $\hat{G}$ be the Approximated Median Graph computed considering $S$ and the bijections obtained by the Common Labelling, $\pi_1, \dots, \pi_m$. Let the cost function $c_M$ be a metric. Then

$$CL^* \ge GM^*(\{\hat{G}\}) \ge GM^* \ge \frac{1}{2} CL^*. \tag{5}$$
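Proof. We start with the left-hand side of (5):

$$\begin{aligned}
CL^* &= \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} c_M(G_i^{\pi_i}, G_j^{\pi_j}) \\
&\ge \frac{1}{m^2} \sum_{j=1}^{m} \sum_{i=1}^{m} c_M(G_i^{\pi_i}, \hat{G}) \\
&= \frac{1}{m} \sum_{i=1}^{m} c_M(G_i^{\pi_i}, \hat{G}) \\
&\ge GM^*(\{\hat{G}\}) \\
&\ge GM^*.
\end{aligned} \tag{6}$$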

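The second step comes from the optimality of the Approximated Median Graph, see (4). For every fixed $j$, the graph $G_j^{\pi_j}$ belongs to $\mathcal{H}$, so it cannot have a lower sum of costs to the $G_i^{\pi_i}$ than $\hat{G}$. The remaining steps follow from the definition (2) of $GM^*$, since $\{\hat{G}\} \subseteq \mathcal{H}$.

The right-hand side of (5) follows from: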

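$$\begin{aligned}
GM^* &= \frac{1}{m} \sum_{i=1}^{m} c_M(G_i^{\rho_i}, G^*) \\
&= \frac{1}{2m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left[ c_M(G_i^{\rho_i}, G^*) + c_M(G^*, G_j^{\rho_j}) \right] \\
&\ge \frac{1}{2m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} c_M(G_i^{\rho_i}, G_j^{\rho_j}) \\
&\ge \frac{1}{2m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} c_M(G_i^{\pi_i}, G_j^{\pi_j}) \\
&= \frac{1}{2} CL^*.
\end{aligned} \tag{7}$$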



The second step uses the symmetry of $c_M$, the third step uses the triangle inequality, and the fourth step comes from the optimality of the Common Labelling bijections $\pi_1, \dots, \pi_m$ in (3).



□ 
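The chain (5) can be spot-checked on toy instances. The sketch below is not a proof. It assumes the $L_1$ cost as the metric $c_M$, random scalar attributes, exhaustive search both for $CL^*$ and for the bijections inside $GM^*(\{\hat G\})$, and the entry-wise median as the minimizer in (4). Since $GM^*$ over all of $\mathcal{H}$ cannot be enumerated, the sketch checks $CL^* \ge \frac{1}{m}\sum_i c_M(G_i^{\pi_i}, \hat G) \ge GM^*(\{\hat G\}) \ge \frac{1}{2}CL^*$. The last inequality holds because the argument for the right-hand side of (5) applies to any graph in place of $G^*$. Helper names are hypothetical.

```python
import random
from itertools import permutations, product
from statistics import median


def permute(V, E, pi):
    n = len(V)
    return ([V[pi[i]] for i in range(n)],
            [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)])


def cost_l1(g, h):
    """Assumed metric cost c_M for a fixed bijection (L1 over all attributes)."""
    (Vg, Eg), (Vh, Eh) = g, h
    n = len(Vg)
    return (sum(abs(Vg[r] - Vh[r]) for r in range(n))
            + sum(abs(Eg[r][s] - Eh[r][s]) for r in range(n) for s in range(n)))


def distance(g, h):
    return min(cost_l1(g, permute(*h, pi)) for pi in permutations(range(len(g[0]))))


def common_labelling(graphs):
    """Exhaustive CL* for (3), pi_1 fixed to the identity."""
    m, n = len(graphs), len(graphs[0][0])
    perms = list(permutations(range(n)))
    best_val, best_pis = float("inf"), None
    for rest in product(perms, repeat=m - 1):
        pis = (tuple(range(n)),) + rest
        rel = [permute(*g, p) for g, p in zip(graphs, pis)]
        val = sum(cost_l1(a, b) for a in rel for b in rel) / m ** 2
        if val < best_val:
            best_val, best_pis = val, pis
    return best_val, best_pis


def approx_median_l1(graphs, pis):
    """Minimizer of (4) for the L1 cost: entry-wise median of relabelled graphs."""
    rel = [permute(*g, p) for g, p in zip(graphs, pis)]
    n = len(rel[0][0])
    return ([median(G[0][r] for G in rel) for r in range(n)],
            [[median(G[1][r][s] for G in rel) for s in range(n)] for r in range(n)])


def check_theorem1(trials=20, n=3, m=3, seed=1):
    rng = random.Random(seed)
    tol = 1e-9
    for _ in range(trials):
        S = [([rng.random() for _ in range(n)],
              [[rng.random() for _ in range(n)] for _ in range(n)]) for _ in range(m)]
        cl, pis = common_labelling(S)
        G_hat = approx_median_l1(S, pis)
        # Average cost of G_hat under the Common Labelling bijections.
        with_pis = sum(cost_l1(permute(*g, p), G_hat) for g, p in zip(S, pis)) / m
        # GM*({G_hat}): every graph may pick its own best bijection.
        gm_hat = sum(distance(g, G_hat) for g in S) / m
        assert cl + tol >= with_pis >= gm_hat - tol
        assert gm_hat >= cl / 2 - tol
    return True
```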



Figure 2: (a) Notation for Theorems 1 and 2. (b) Graphical representation of the basic inequality for proving Theorem 2.

Theorem 2. Let $\mathcal{H}$ be a set of graphs and $S = \{G_1, \dots, G_m\}$ be a subset of $\mathcal{H}$. In addition, let $\hat{G}$ be the Approximated Median Graph computed considering $S$, and $G^*$ the Generalized Median Graph. Let the cost function $c_M$ be a metric. Then

$$d(\hat{G}, G^*) \le 2\,CL^* \le 4\,GM^*. \tag{8}$$
Proof. Let $\pi_1, \dots, \pi_m$ be the bijections obtained by the Common Labelling, $\rho_1, \dots, \rho_m$ the bijections related to $G^*$, and $c_M$ a metric cost function. Since $c_M$ is a metric, we have for each single graph $G_i$:



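$$\begin{aligned}
c_M(G^*, \hat{G}) &\le c_M(G^*, G_i^{\rho_i}) + c_M(\hat{G}, G_i^{\rho_i}) \\
&\le c_M(G^*, G_i^{\rho_i}) + c_M(G_i^{\rho_i}, G_i^{\pi_i}) + c_M(\hat{G}, G_i^{\pi_i}).
\end{aligned} \tag{9}$$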

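Since $\rho_i$ and $\pi_i$ may be different, $c_M(G_i^{\rho_i}, G_i^{\pi_i})$ may be non-zero. However, applying the bijection $\pi_i^{-1} \circ \rho_i$ to both $G_i^{\pi_i}$ and $\hat{G}$ preserves their cost, $c_M(\hat{G}, G_i^{\pi_i}) = c_M(\hat{G}^{\pi_i^{-1} \circ \rho_i}, G_i^{\rho_i})$, and makes the middle term vanish, $c_M(G_i^{\rho_i}, G_i^{\pi_i \circ \pi_i^{-1} \circ \rho_i}) = 0$. This reasoning is visualized in Figure 2 (b). Hence,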

$$c_M(\hat{G}^{\pi_i^{-1} \circ \rho_i}, G^*) \le c_M(\hat{G}^{\pi_i^{-1} \circ \rho_i}, G_i^{\rho_i}) + c_M(G_i^{\rho_i}, G^*). \tag{10}$$



In (10), the vertices and edges of $G_1, \dots, G_m$ and $\hat{G}$ have been permuted according to $G^*$. To ease notation, assume that the $\pi_i$ correspond to the identity. Consequently,

$$d(\hat{G}, G^*) \le c_M(\hat{G}^{\rho_i}, G^*) \le c_M(\hat{G}^{\rho_i}, G_i^{\rho_i}) + c_M(G_i^{\rho_i}, G^*). \tag{11}$$



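Then, averaging inequality (11) over the different $G_i$'s and using $c_M(\hat{G}^{\rho_i}, G_i^{\rho_i}) = c_M(G_i^{\pi_i}, \hat{G})$ (recall that $\pi_i$ is the identity), we get: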

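$$\begin{aligned}
d(\hat{G}, G^*) &\le \frac{1}{m} \sum_{i=1}^{m} \left[ c_M(G_i^{\pi_i}, \hat{G}) + c_M(G_i^{\rho_i}, G^*) \right] \\
&= \frac{1}{m} \sum_{i=1}^{m} c_M(G_i^{\pi_i}, \hat{G}) + GM^* \\
&\le CL^* + CL^* = 2\,CL^*,
\end{aligned} \tag{12}$$

where the last step uses $\frac{1}{m} \sum_{i} c_M(G_i^{\pi_i}, \hat{G}) \le CL^*$, shown in the proof of Theorem 1, and $GM^* \le CL^*$ from (5). The second inequality in (8) follows from $CL^* \le 2\,GM^*$, also from (5).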

□ 
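The per-graph bounds (9) to (11) use only the triangle inequality, the invariance of $c_M$ under relabelling, and the fact that $d$ is a minimum over bijections. Averaged over $i$, they therefore give $d(\hat G, G) \le \frac{1}{m}\sum_i [c_M(G_i^{\pi_i}, \hat G) + d(G_i, G)]$ for any bijections $\pi_i$ and any reference graph $G$, not only $G^*$. The sketch spot-checks this averaged form with random bijections and a random $G$ standing in for $G^*$, which cannot be computed here. It assumes the $L_1$ cost and an entry-wise median for $\hat G$ (illustrative choices, hypothetical names).

```python
import random
from itertools import permutations
from statistics import median


def permute(V, E, pi):
    n = len(V)
    return ([V[pi[i]] for i in range(n)],
            [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)])


def cost_l1(g, h):
    """Assumed metric cost c_M for a fixed bijection (L1 over all attributes)."""
    (Vg, Eg), (Vh, Eh) = g, h
    n = len(Vg)
    return (sum(abs(Vg[r] - Vh[r]) for r in range(n))
            + sum(abs(Eg[r][s] - Eh[r][s]) for r in range(n) for s in range(n)))


def distance(g, h):
    return min(cost_l1(g, permute(*h, pi)) for pi in permutations(range(len(g[0]))))


def random_graph(n, rng):
    return ([rng.random() for _ in range(n)],
            [[rng.random() for _ in range(n)] for _ in range(n)])


def approx_median_l1(graphs, pis):
    rel = [permute(*g, p) for g, p in zip(graphs, pis)]
    n = len(rel[0][0])
    return ([median(G[0][r] for G in rel) for r in range(n)],
            [[median(G[1][r][s] for G in rel) for s in range(n)] for r in range(n)])


def check_averaged_bound(trials=30, n=3, m=4, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        S = [random_graph(n, rng) for _ in range(m)]
        pis = [tuple(rng.sample(range(n), n)) for _ in range(m)]  # arbitrary bijections
        G_hat = approx_median_l1(S, pis)
        G_ref = random_graph(n, rng)  # stands in for G*
        lhs = distance(G_hat, G_ref)
        rhs = sum(cost_l1(permute(*g, p), G_hat) + distance(g, G_ref)
                  for g, p in zip(S, pis)) / m
        assert lhs <= rhs + 1e-9
    return True
```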

A desirable outcome for the user is that the Approximated Median Graph is an $\varepsilon$-approximation of the given objects. The following corollary shows that, in this case, the Approximated Median Graph is close to the actual Median Graph.

Corollary 1. Let $S = \{G_1, \dots, G_m\}$ admit an Approximated Median Graph $\hat{G}$ such that $GM^*(\{\hat{G}\}) \le \varepsilon$. Then $d(\hat{G}, G^*) \le 3\varepsilon$.

The proof is based on equation (12) of Theorem 2 and is left to the reader.
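One possible way to complete the argument, assuming as in Theorem 2 that $c_M$ is a metric, is to average the per-graph bound (11) over $i$ and then apply both halves of (5):

```latex
\begin{aligned}
d(\hat G, G^*)
  &\le \frac{1}{m}\sum_{i=1}^{m} c_M(G_i^{\pi_i}, \hat G)
     + \frac{1}{m}\sum_{i=1}^{m} c_M(G_i^{\rho_i}, G^*)
     && \text{average of (11) over } i \\
  &\le CL^* + GM^*
     && \text{first chain in the proof of Theorem 1} \\
  &\le 2\,GM^* + GM^*
     && CL^* \le 2\,GM^* \text{ by (5)} \\
  &\le 3\,GM^*(\{\hat G\}) \le 3\varepsilon
     && GM^* \le GM^*(\{\hat G\}) \text{ by (5)}.
\end{aligned}
```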
Theorems 1 and 2 are proven considering the optimal computation of $CL^*$. If we relax this assumption and allow a suboptimal computation, we get the following corollary.
Corollary 2. Let $\mathcal{H}$ be a set of graphs and $S = \{G_1, \dots, G_m\}$ be a subset of $\mathcal{H}$. In addition, let $\hat{G}'$ be the Approximated Median Graph computed considering $S$ and the, possibly suboptimal, bijections obtained by a Common Labelling whose value is $CL$. Let the cost function $c_M$ be a metric. Then $CL \ge GM^*(\{\hat{G}'\}) \ge GM^*$ and $d(\hat{G}', G^*) \le 2\,CL$.
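The first chain in Corollary 2 relies only on the optimality (4) of the synthesized graph, not on the optimality of the bijections. It should therefore hold for any labelling. The following spot check uses random, typically suboptimal, bijections and assumes the $L_1$ cost and the entry-wise median as the minimizer of (4). Helper names are hypothetical, and `G_hat` plays the role of $\hat G'$.

```python
import random
from itertools import permutations
from statistics import median


def permute(V, E, pi):
    n = len(V)
    return ([V[pi[i]] for i in range(n)],
            [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)])


def cost_l1(g, h):
    """Assumed metric cost c_M for a fixed bijection (L1 over all attributes)."""
    (Vg, Eg), (Vh, Eh) = g, h
    n = len(Vg)
    return (sum(abs(Vg[r] - Vh[r]) for r in range(n))
            + sum(abs(Eg[r][s] - Eh[r][s]) for r in range(n) for s in range(n)))


def distance(g, h):
    return min(cost_l1(g, permute(*h, pi)) for pi in permutations(range(len(g[0]))))


def approx_median_l1(graphs, pis):
    rel = [permute(*g, p) for g, p in zip(graphs, pis)]
    n = len(rel[0][0])
    return ([median(G[0][r] for G in rel) for r in range(n)],
            [[median(G[1][r][s] for G in rel) for s in range(n)] for r in range(n)])


def check_corollary2(trials=30, n=3, m=4, seed=5):
    rng = random.Random(seed)
    tol = 1e-9
    for _ in range(trials):
        S = [([rng.random() for _ in range(n)],
              [[rng.random() for _ in range(n)] for _ in range(n)]) for _ in range(m)]
        pis = [tuple(rng.sample(range(n), n)) for _ in range(m)]  # likely suboptimal
        rel = [permute(*g, p) for g, p in zip(S, pis)]
        cl = sum(cost_l1(a, b) for a in rel for b in rel) / m ** 2  # value CL of this labelling
        G_hat = approx_median_l1(S, pis)
        with_pis = sum(cost_l1(a, G_hat) for a in rel) / m
        gm_hat = sum(distance(g, G_hat) for g in S) / m            # GM*({G_hat'})
        assert cl + tol >= with_pis >= gm_hat - tol
    return True
```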

5 Median Graph of Weighted Graphs 

Clearly, the notion of Median Graph can be used with a large set of different cost functions. In this section, we show that, using the cost between graphs originally proposed in [1] and restricting to weighted graphs, the Median Graph problem reduces exactly to the Common Labelling problem. Let $A_V : V \to [0, 1]$ and $A_E : E \to [0, 1]$ be the vertex and edge attribute functions. In this case, the value 1 indicates that the graph vertex, or edge, exists, and the value 0 that it does not exist. We use a vector/matrix representation, so that $V_i(r) = A_V(v_r)$ and $E_i(r, s) = A_E(e_{r,s})$, where $v_r \in V_i$ and $e_{r,s} \in E_i$, and the bijections $\pi_i$ are represented as permutation matrices $P_i$. When no vertex position is indicated, as in $V_i$, we refer to the complete vector.
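With the convention $v_i(\pi) = v_{\pi(i)}$ of Section 2, a bijection acts on this representation as $P_i V_i$ and $P_i E_i P_i^{\top}$, where row $r$ of $P_i$ has its single 1 in column $\pi_i(r)$. The sketch below (pure Python, 0-based indices, hypothetical helper names) checks this matrix form against direct re-indexing. The $P E P^{\top}$ expression is standard linear algebra rather than notation taken from the text.

```python
def permutation_matrix(pi):
    """P with P[i][pi[i]] = 1, so that (P V)[i] = V[pi[i]]."""
    n = len(pi)
    return [[1.0 if j == pi[i] else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def apply_bijection(P, V, E):
    """Return (P V, P E P^T): the relabelled vertex vector and edge matrix."""
    PV = [sum(P[i][k] * V[k] for k in range(len(V))) for i in range(len(V))]
    PEPt = matmul(matmul(P, E), transpose(P))
    return PV, PEPt


pi = (2, 0, 1)
P = permutation_matrix(pi)
V = [0.2, 0.9, 0.5]
E = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 0.4],
     [0.7, 0.0, 0.0]]
PV, PEPt = apply_bijection(P, V, E)
```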

As cost function we use the squared Euclidean distance, $c(v_r, v_s) = \|V_i(r) - V_j(s)\|^2$, where $v_r \in V_i$ and $v_s \in V_j$. The edge cost function is defined analogously. This cost was also used in the genetic algorithm of [1], where the authors proved that the best prototype for a set of graphs with fixed labellings is the average of the attributes:



$$\bar{V}(r) = \frac{1}{m} \sum_{k=1}^{m} V_k(r), \qquad \bar{E}(r, s) = \frac{1}{m} \sum_{k=1}^{m} E_k(r, s). \tag{13}$$
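For a single vertex position $r$ with fixed labellings, the averaging rule can be recovered with a standard one-variable argument. This is a textbook derivation, not necessarily the proof given in [1]:

```latex
\frac{\partial}{\partial x}\sum_{k=1}^{m}\bigl(V_k(r)-x\bigr)^2
  = -2\sum_{k=1}^{m}\bigl(V_k(r)-x\bigr) = 0
  \;\Longrightarrow\;
  x = \frac{1}{m}\sum_{k=1}^{m}V_k(r),
\qquad
\frac{\partial^2}{\partial x^2}\sum_{k=1}^{m}\bigl(V_k(r)-x\bigr)^2 = 2m > 0.
```

The same computation applies entry-wise to the edge attributes. Since an average of values in $[0, 1]$ stays in $[0, 1]$, the prototype is again a weighted graph.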

Under these considerations, we can state the following theorem: 

Theorem 3. Let $\mathcal{H}$ be a set of weighted graphs, $S = \{G_1, \dots, G_m\}$ a given subset of $\mathcal{H}$, and $P_1, \dots, P_m \in \mathbb{R}^{n \times n}$ $m$ permutation matrices. Considering the cost given by the squared Euclidean distance, we have:

$$\frac{1}{2} CL^* = GM^*. \tag{14}$$
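Proof. The scalar product of two vectors is

$$\langle V_i, V_j \rangle = \sum_{r=1}^{n} V_i(r) V_j(r).$$

The proof follows the lines of the Huygens theorem [in]. Let $P_1, \dots, P_m$ be the permutation matrices attaining $CL^*$, and let $\bar{V} = \frac{1}{m} \sum_{j=1}^{m} P_j V_j$. Then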



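$$\begin{aligned}
\frac{1}{2} CL^* &= \frac{1}{2m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \|P_i V_i\|^2 - 2\langle P_i V_i, P_j V_j \rangle + \|P_j V_j\|^2 \right) \\
&= \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \|P_i V_i\|^2 - \langle P_i V_i, P_j V_j \rangle \right) \\
&= \frac{1}{m} \sum_{i=1}^{m} \|P_i V_i\|^2 + \sum_{i=1}^{m} \sum_{j=1}^{m} \left( -\frac{2}{m^2} \langle P_i V_i, P_j V_j \rangle + \frac{1}{m^2} \langle P_i V_i, P_j V_j \rangle \right) \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( \|P_i V_i\|^2 - \frac{2}{m} \sum_{j=1}^{m} \langle P_i V_i, P_j V_j \rangle \right) + \left\langle \frac{1}{m} \sum_{i=1}^{m} P_i V_i, \frac{1}{m} \sum_{j=1}^{m} P_j V_j \right\rangle \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( \|P_i V_i\|^2 - 2\langle P_i V_i, \bar{V} \rangle + \|\bar{V}\|^2 \right) \\
&= \frac{1}{m} \sum_{i=1}^{m} \|P_i V_i - \bar{V}\|^2 \\
&\ge GM^*.
\end{aligned} \tag{15}$$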

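The converse inequality is proved similarly, by applying the same identities to the bijections $\rho_i$ of $G^*$, whose attributes are the averages (13) of the relabelled graphs. The process is equivalent for the edge costs. □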
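The identity behind (15) is the Huygens decomposition: half the mean pairwise squared distance equals the mean squared distance to the centroid. The following quick numerical check uses random vectors that stand in for the relabelled attribute vectors $P_i V_i$ (illustrative data, hypothetical names).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def huygens_sides(vectors):
    """Return both sides of the Huygens identity for vectors a_1..a_m:
    (1 / (2 m^2)) sum_ij ||a_i - a_j||^2  and  (1/m) sum_i ||a_i - mean||^2."""
    m, dim = len(vectors), len(vectors[0])
    mean = [sum(v[r] for v in vectors) / m for r in range(dim)]
    pairwise = sum(sq_dist(a, b) for a in vectors for b in vectors) / (2 * m ** 2)
    to_mean = sum(sq_dist(a, mean) for a in vectors) / m
    return pairwise, to_mean


rng = random.Random(3)
vecs = [[rng.random() for _ in range(5)] for _ in range(4)]
left, right = huygens_sides(vecs)
```

Both values agree up to rounding for any input. In (15), the same identity is applied with $a_i = P_i V_i$.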

As an immediate consequence of Theorem 3, the error of the Approximated Median Graph is the same as that of the Generalized Median Graph. The proof is based on Theorem 3 and is left to the reader.

Corollary 3. Under the hypothesis of Theorem 3, we have:

$$GM^*(\{\hat{G}\}) = GM^*. \tag{16}$$

By exploiting the particular properties of the squared Euclidean distance, which is not a metric, we get a much stronger result than Theorem 1.
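Corollary 3 can also be spot-checked on toy weighted graphs. The check uses exhaustive search for $CL^*$, the entry-wise mean (13) of the relabelled graphs as $\hat G$, and exhaustive search for the bijections inside $GM^*(\{\hat G\})$. The resulting value should equal $\frac{1}{2}CL^*$ up to rounding. Sizes, seeds and helper names are illustrative assumptions, and this is a sanity check, not a proof.

```python
import random
from itertools import permutations, product


def permute(V, E, pi):
    """G^pi with V'[i] = V[pi[i]] and E'[i][j] = E[pi[i]][pi[j]]."""
    n = len(V)
    return ([V[pi[i]] for i in range(n)],
            [[E[pi[i]][pi[j]] for j in range(n)] for i in range(n)])


def cost_sq(g, h):
    """Squared Euclidean cost on vertex and edge attributes for a fixed bijection."""
    (Vg, Eg), (Vh, Eh) = g, h
    n = len(Vg)
    return (sum((Vg[r] - Vh[r]) ** 2 for r in range(n))
            + sum((Eg[r][s] - Eh[r][s]) ** 2 for r in range(n) for s in range(n)))


def distance(g, h):
    """Minimum squared cost over all relabellings of h."""
    return min(cost_sq(g, permute(*h, pi)) for pi in permutations(range(len(g[0]))))


def common_labelling(graphs):
    """Exhaustive CL* for (3) with pi_1 fixed to the identity."""
    m, n = len(graphs), len(graphs[0][0])
    perms = list(permutations(range(n)))
    best_val, best_pis = float("inf"), None
    for rest in product(perms, repeat=m - 1):
        pis = (tuple(range(n)),) + rest
        rel = [permute(*g, p) for g, p in zip(graphs, pis)]
        val = sum(cost_sq(a, b) for a in rel for b in rel) / m ** 2
        if val < best_val:
            best_val, best_pis = val, pis
    return best_val, best_pis


def mean_graph(graphs, pis):
    """Approximated Median Graph from (13): entry-wise mean of the relabelled graphs."""
    rel = [permute(*g, p) for g, p in zip(graphs, pis)]
    m, n = len(rel), len(rel[0][0])
    return ([sum(G[0][r] for G in rel) / m for r in range(n)],
            [[sum(G[1][r][s] for G in rel) / m for s in range(n)] for r in range(n)])


def check_corollary3(trials=10, n=3, m=3, seed=4):
    rng = random.Random(seed)
    for _ in range(trials):
        # Weighted graphs: every vertex and edge attribute lies in [0, 1].
        S = [([rng.random() for _ in range(n)],
              [[rng.random() for _ in range(n)] for _ in range(n)]) for _ in range(m)]
        cl, pis = common_labelling(S)
        G_hat = mean_graph(S, pis)
        gm_hat = sum(distance(g, G_hat) for g in S) / m  # GM*({G_hat})
        assert abs(gm_hat - cl / 2) < 1e-9
    return True
```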

6 Discussion 

In this paper we analysed the relation between two structural pattern recog- 
nition problems, the Median Graph and the Common Labelling. We proved 
that these problems are closely related and in some special cases they are 
in fact equivalent, thereby formalising a connection which up to now was 
unknown. This connection confirms that algorithms based on the Common Labelling to compute the Median Graph are theoretically sound. In addition, the proposed bounds are useful in practice when the Common Labelling is computed using non-exact algorithms, like in